Cross-Context News Corpus for Protest Event-Related Knowledge Base Construction
Authors
Abstract
We describe a gold standard corpus of protest events that comprises various local and international English-language news sources from multiple countries. The corpus contains document-, sentence-, and token-level annotations. It facilitates creating machine learning models that automatically classify news articles and extract event-related information, and thereby constructing knowledge bases that enable comparative social and political science studies. For each source, the annotation starts with random samples and continues with samples drawn using active learning. Each batch is annotated by two scientists, adjudicated by an annotation supervisor, and improved by semi-automatically identifying annotation errors. We found that the corpus possesses the variety and quality necessary to develop and benchmark text classification and event extraction systems in a cross-context setting, contributing to the generalizability and robustness of automated text processing systems. The corpus and the reported results will establish a common foundation for automated protest event collection studies, which is currently lacking in the literature.
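The annotation workflow described above (a random seed batch, later batches chosen by active learning, double annotation with adjudication, then semi-automatic error checking) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the classifier, query strategy, or error-detection method. The sketch assumes binary document labels (1 = protest, 0 = non-protest), uncertainty sampling around a predicted probability of 0.5, and a confidence threshold for flagging suspect labels. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence


def random_seed_batch(pool: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw the first batch uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(pool), k)


def uncertainty_batch(
    pool: Sequence[str], prob_protest: Callable[[str], float], k: int
) -> List[str]:
    """Hypothetical active-learning step: return the k documents whose
    predicted protest probability is closest to 0.5 (most uncertain first)."""
    return sorted(pool, key=lambda doc: abs(prob_protest(doc) - 0.5))[:k]


def adjudicate(
    labels_a: Dict[str, int],
    labels_b: Dict[str, int],
    supervisor: Callable[[str], int],
) -> Dict[str, int]:
    """Keep labels both annotators agree on; ask the supervisor otherwise."""
    final: Dict[str, int] = {}
    for doc, label_a in labels_a.items():
        if label_a == labels_b[doc]:
            final[doc] = label_a
        else:
            final[doc] = supervisor(doc)
    return final


def flag_suspicious(
    final: Dict[str, int],
    prob_protest: Callable[[str], float],
    threshold: float = 0.9,
) -> List[str]:
    """Assumed error-detection heuristic: flag adjudicated labels that a
    confident model contradicts, so humans can re-check them."""
    flagged: List[str] = []
    for doc, label in final.items():
        p = prob_protest(doc)
        if (label == 1 and p < 1 - threshold) or (label == 0 and p > threshold):
            flagged.append(doc)
    return flagged


if __name__ == "__main__":
    pool = ["doc1", "doc2", "doc3", "doc4"]
    probs = {"doc1": 0.9, "doc2": 0.6, "doc3": 0.1, "doc4": 0.45}
    print(random_seed_batch(pool, 2))
    print(uncertainty_batch(pool, probs.get, 2))  # the two most uncertain docs
```

Uncertainty sampling is used here only because it is a common, simple query strategy; any other active-learning criterion could be substituted in `uncertainty_batch`.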
Similar Resources
Constructing an Annotated Corpus for Protest Event Mining
We present a corpus for protest event mining that combines token-level annotation with the event schema and ontology of entities and events from protest research in the social sciences. The dataset uses newswire reports from the English Gigaword corpus. The token-level annotation is inspired by annotation standards for event extraction, in particular that of the Automated Content Extraction 200...
Context Specific Event Model For News Articles
We present a new context-based event indexing and event ranking model for news articles. The context event clusters formed from the UNL graphs use a modified scoring scheme for segmenting events, which is followed by clustering of events. From the context clusters obtained, three models are developed: identification of main and sub-events, event indexing, and event ranking. Based on the properti...
Thai Broadcast News Corpus Construction and Evaluation
Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora. Specifications and conventions used in the transcription process are described in the paper. The speech corpus contains about 17 hours of speech data while the text corpus was...
Internet as Corpus: Automatic Construction of a Swedish News Corpus
This paper describes the automatic building of a corpus of short Swedish news texts from the Internet, its application and possible future use. The corpus is aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. The corpus has been constructed by using an Internet agent, the so-called newsAgent, downloading Swedish news text f...
Journal
Journal title: Data Intelligence
Year: 2021
ISSN: 2096-7004, 2641-435X
DOI: https://doi.org/10.1162/dint_a_00092